Sowing dates prioritization using Causal Random Forest and Policy Trees: Evidence from Agronomic Trials in Eastern India
Author
Maxwell Mkondiwa
1 Introduction
In this notebook, I use a causal machine learning estimator, i.e., multi-armed causal random forest with augmented inverse propensity score weights (Athey et al 2019), to estimate conditional average treatment effects (CATES) for agronomic practices. These CATEs are estimated for each individual farm thereby providing personalized estimates of the potential effectiveness of the practices. I then use a debiased robust estimator in a policy tree optimization (Athey and Wager 2021) to generate optimal recommendations in the form of agronomic practices that maximize potential yield gains.
CSISA_KVKestim$SowingSchedule=ordered(CSISA_KVKestim$SowingSchedule,levels=c("T5","T4","T3","T2","T1"))#Create dummies for some categorical variableslibrary(fastDummies)CSISA_KVKestim <- fastDummies::dummy_cols(CSISA_KVKestim, select_columns =c("VarietyClass","SoilType","CropEstablishment","Year","District"))
2.1 Graphics
# Bar graphs showing percentage of farmers adopting these practices library(tidyverse) library(ggplot2) bar_chart=function(dat,var){ dat|>drop_na({{var}})|>mutate({{var}}:=factor({{var}})|>fct_infreq())|>ggplot()+geom_bar(aes(y={{var}}),fill="dodgerblue4")+theme_minimal(base_size =16) } sow_plot=bar_chart(CSISA_KVKestim,SowingSchedule)+labs(y="Sowing dates") sow_plot
library(ggpubr) library(tidyverse) #Sowing dates SowingDate_Options_Errorplot= CSISA_KVKestim%>%drop_na(SowingSchedule) %>%ggerrorplot(x ="SowingSchedule", y ="GrainYield",add ="mean", error.plot ="errorbar", color="steelblue", ggtheme=theme_bw())+labs(x="Sowing date options",y="Wheat yield (t/ha)")+theme_minimal(base_size =16)+coord_flip()
Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
i Please use the `fun` argument instead.
i The deprecated feature was likely used in the ggpubr package.
Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
Warning: The `fun.ymin` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
i Please use the `fun.min` argument instead.
i The deprecated feature was likely used in the ggpubr package.
Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
Warning: The `fun.ymax` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
i Please use the `fun.max` argument instead.
i The deprecated feature was likely used in the ggpubr package.
Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
SowingDate_Options_Errorplot
2.2 Descriptives
library(fBasics)summ_stats <- fBasics::basicStats(CSISA_KVKestim[,c("GrainYield","VarietyClass_LDV","SoilType_Heavy","SoilType_Medium","SoilType_Low","CropEstablishment_CT","CropEstablishment_CT-line","CropEstablishment_ZT")]) summ_stats <-as.data.frame(t(summ_stats)) # Rename some of the columns for convenience summ_stats <- summ_stats[c("Mean", "Stdev", "Minimum", "1. Quartile", "Median", "3. Quartile", "Maximum")] %>%rename("Lower quartile"='1. Quartile', "Upper quartile"="3. Quartile") summ_stats
GRF forest object of type regression_forest
Number of trees: 2000
Number of training samples: 3628
Variable importance:
1 2 3 4 5 6 7 8 9 10 11 12 13
0.641 0.002 0.001 0.000 0.002 0.000 0.001 0.101 0.008 0.009 0.040 0.036 0.158
varimp.multi_sowing <-variable_importance(Y.multi_sowing.forest)Y.hat.multi.all_sowing <-predict(Y.multi_sowing.forest, estimate.variance =TRUE)$predictionsmulti_sowing.forest <-multi_arm_causal_forest(X = X_cf_sowing, Y = Y_cf_sowing, W = W_cf_sowing ,W.hat=W.hat.multi.all_sowing,Y.hat=Y.hat.multi.all_sowing,seed=2) varimp.multi_sowing_cf <-variable_importance(multi_sowing.forest)multi_sowing_ate=average_treatment_effect(multi_sowing.forest, method="AIPW")
Warning in get_scores.multi_arm_causal_forest(forest, subset = subset, debiasing.weights = debiasing.weights, : Estimated treatment propensities take values very close to 0 or 1 meaning some estimates may not be well identified. In particular, the minimum propensity estimates for each arm is
T5: 0 T4: 0.004 T3: 0.002 T2: 0 T1: 0
and the maximum is
T5: 0.893 T4: 0.873 T3: 0.834 T2: 0.922 T1: 0.941.
Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
i Please use `linewidth` in the `default_aes` field and elsewhere instead.